
We Claim 

1. A method for determining a probability for one or more states for a nucleotide in a nucleic 
acid sequence, comprising: 

a) determining an initial oligonucleotide probability for each of said states for an initial 
oligonucleotide in said nucleic acid sequence; 

b) determining transition probabilities for each of said states for nucleotides within said 
nucleic acid sequence following said initial oligonucleotide; 

c) determining a probability for said nucleic acid sequence for each of said states; and, 

d) determining a probability for each of said states for said nucleotide based upon said 
probability of said nucleic acid sequence and a bias. 
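
Illustrative sketch (not part of the claims): steps a) through c) of claim 1 can be pictured as scoring a sequence under one state's Markov model. The function name `sequence_log_prob`, the table layout, the fixed model order of 2, and the uniform toy tables are all assumptions made for illustration. A real inhomogeneous model would hold separate transition tables per codon position.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"

def sequence_log_prob(seq, initial, transition, order=2):
    """Log-probability of `seq` under a single state's Markov model.

    initial:    dict mapping the first `order` nucleotides to a probability.
    transition: dict mapping (preceding oligonucleotide, next nucleotide)
                to a probability.
    """
    # Step a): probability of the initial oligonucleotide.
    logp = math.log(initial[seq[:order]])
    # Step b): transition probability of every following nucleotide.
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        logp += math.log(transition[(context, seq[i])])
    # Step c): the accumulated value is the sequence probability (in logs).
    return logp

# Toy tables (hypothetical): every dinucleotide and transition equally likely.
uniform_initial = {"".join(p): 1.0 / 16 for p in product(NUCLEOTIDES, repeat=2)}
uniform_transition = {
    ("".join(p), n): 0.25
    for p in product(NUCLEOTIDES, repeat=2)
    for n in NUCLEOTIDES
}
```

Working in log space is a design choice of this sketch; it avoids underflow when the sequence is on the order of a hundred nucleotides long.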



2. The method of claim 1, wherein said probability for each of said states for said nucleotide is 
determined using an inhomogeneous Markov model having eight states, wherein said eight states 
are: first reading frame positive strand (1+); second reading frame positive strand (2+); third 
reading frame positive strand (3+); first reading frame negative strand (1-); second reading frame
negative strand (2-); third reading frame negative strand (3-); noncoding positive strand (N+); 
and, noncoding negative strand (N-). 

3. The method of claim 2, wherein said probability for each of said eight states for said 
nucleotide in step d) is determined using the equation

$$p(f \mid S) = \frac{b(f) \cdot P_f \cdot P_f(S)}{\sum_{i \in \{1+,\,2+,\,3+,\,N+,\,1-,\,2-,\,3-,\,N-\}} b(i) \cdot P_i \cdot P_i(S)}$$
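
Illustrative sketch (not part of the claims): the normalization recited in claim 3 can be worked through numerically as below. The sketch assumes that the bias enters as a per-state multiplicative weight alongside each state's prior and sequence probability. The names `state_posteriors`, `seq_prob`, `prior`, and `bias` are hypothetical.

```python
STATES = ("1+", "2+", "3+", "N+", "1-", "2-", "3-", "N-")

def state_posteriors(seq_prob, prior, bias):
    """Return the normalized probability of each of the eight states.

    seq_prob[f]: probability of the sequence under state f.
    prior[f]:    a priori probability of state f.
    bias[f]:     weight applied to state f (assumed multiplicative here).
    """
    weighted = {f: bias[f] * prior[f] * seq_prob[f] for f in STATES}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("all weighted state probabilities are zero")
    return {f: w / total for f, w in weighted.items()}
```

With equal priors, equal biases, and a sequence probability four times higher for state 1+ than for any other state, the result for 1+ is 4/11, since the seven remaining states each contribute one unit to the denominator.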



4. The method of claim 1, wherein said nucleotide is the middle nucleotide in said nucleic acid 
sequence. 

5. The method of claim 1, wherein said nucleic acid sequence is part of a longer nucleic acid
sequence. 

6. The method of claim 1, wherein said bias is between 0.0 and 0.9 or greater than 1.1.

7. A method for determining a probability for one or more states for a nucleotide in a nucleic 
acid sequence, comprising: 

a) determining an initial oligonucleotide probability for each of said states for an initial 
oligonucleotide in said nucleic acid sequence; 

b) determining transition probabilities for each of said states for nucleotides within said 
nucleic acid sequence following said initial oligonucleotide; 

c) determining a probability for said nucleic acid sequence for each of said states; and, 

d) determining a probability for each of said states for said nucleotide based upon said 
probability of said nucleic acid sequence, wherein said determining a probability for each of said 
states is capable of accepting a bias. 

8. A method for determining a probability for each of one or more states for more than one 
nucleotide in a nucleic acid sequence comprising: 

a) determining an initial oligonucleotide probability for each of said states for an initial 
oligonucleotide in a window of a first nucleotide; 

b) determining transition probabilities for each of said states for nucleotides within said 
window following said initial oligonucleotide; 

c) determining a probability for said window for each of said states; 

d) determining a probability for each of said states for said nucleotide based upon said 
probability for said window and a bias; and, 

e) repeating steps a) through d) for each remaining nucleotide in said nucleic acid 
sequence. 

9. The method of claim 8, wherein said more than one nucleotide are contiguous, and step e) is 
performed sequentially from said first nucleotide to a last nucleotide. 
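
Illustrative sketch (not part of the claims): the per-nucleotide repetition of claims 8 and 9 amounts to sliding a window along the sequence. The sketch assumes an odd window length so that each nucleotide sits in the middle of its own window, and it covers only positions whose window fits entirely inside the sequence. `centered_windows`, `per_nucleotide_states`, and the `score_window` callback are hypothetical names.

```python
def centered_windows(seq, window_len):
    """Yield (index, window) for each nucleotide whose window fits in seq."""
    if window_len % 2 == 0:
        raise ValueError("window_len must be odd to have a middle nucleotide")
    half = window_len // 2
    # Sequential pass from the first eligible nucleotide to the last.
    for i in range(half, len(seq) - half):
        yield i, seq[i - half:i + half + 1]

def per_nucleotide_states(seq, window_len, score_window):
    """Apply `score_window` (steps a) through d)) to every window in turn."""
    return {i: score_window(w) for i, w in centered_windows(seq, window_len)}
```

Here `score_window` stands in for whatever routine turns one window into a set of state probabilities.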

10. The method of claim 9, wherein said probability for each of said states for said more than
one nucleotide is determined using an inhomogeneous Markov model having eight states, 
wherein said eight states are: first reading frame positive strand (1+); second reading frame 
positive strand (2+); third reading frame positive strand (3+); first reading frame negative strand 
(1-); second reading frame negative strand (2-); third reading frame negative strand (3-); 
noncoding positive strand (N+); and, noncoding negative strand (N-). 

11. The method of claim 10, wherein said probability for each of said states for said more than 
one nucleotide is determined using the equation 

$$p(f \mid S) = \frac{b(f) \cdot P_f \cdot P_f(S)}{\sum_{i \in \{1+,\,2+,\,3+,\,N+,\,1-,\,2-,\,3-,\,N-\}} b(i) \cdot P_i \cdot P_i(S)}$$

12. The method of claim 8, wherein said nucleic acid sequence is part of a longer nucleic acid 
sequence. 

13. The method of claim 8, wherein each nucleotide in said more than one nucleotide is the 
middle nucleotide in its own window. 

14. The method of claim 8, further comprising: 

f) extending said nucleic acid sequence if said window extends beyond either end of said 
nucleic acid sequence, wherein said extending is accomplished by copying nucleotides from an 
end of said nucleic acid sequence at which said window is located to produce a copied nucleotide 
sequence, and adding said copied nucleotide sequence to said end. 
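
Illustrative sketch (not part of the claims): the end extension of claim 14 can be pictured as padding the sequence so that the first and last nucleotides also get full windows. The sketch assumes the copied nucleotides keep their original order and that half a window is copied at each end. Neither detail is fixed by the claim, and `pad_by_copying` is a hypothetical name.

```python
def pad_by_copying(seq, window_len):
    """Extend both ends of `seq` by copying half a window of end nucleotides."""
    half = window_len // 2
    if half == 0:
        return seq  # a window of length 1 never overhangs the sequence
    if len(seq) < half:
        raise ValueError("sequence is shorter than half a window")
    # Copy the leading nucleotides to the front and the trailing ones to the back.
    return seq[:half] + seq + seq[-half:]
```

After padding, the nucleotide originally at index `i` sits at index `i + window_len // 2`.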

15. The method of claim [illegible], wherein said window has a length of about 75 to about 125.

16. The method of claim 8, wherein said bias is between 0.0 and 0.9 or greater than 1.1.

17. A method for determining strand coding of a nucleic acid sequence, comprising:

a) determining a probability of each of one or more states for each nucleotide in said
nucleic acid sequence based upon a bias, wherein each of said states is either a positive strand
state or a negative strand state;

b) summing said probabilities of said positive strand states for each of said nucleotides to
produce a sum of probabilities for positive states;

c) summing said probabilities of said negative strand states for each of said nucleotides to
produce a sum of probabilities for negative states; and,

d) deciding one of

i) coding is mixed or not detectable if a first function of said sum of probabilities
for positive states and said sum of probabilities for negative states is less than a threshold
value;

ii) coding is on said positive strand if a second function of said sum of
probabilities for positive states is greater than a third function of said sum of probabilities
for negative states and said first function is not less than said threshold value; and

iii) coding is on said negative strand if said second function of said sum of
probabilities for positive states is not greater than said third function of said sum of
probabilities for negative states and said first function is not less than said threshold
value.
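
Illustrative sketch (not part of the claims): the three-way decision of claim 17 can be written as below. The three functions are passed in as parameters because the claim leaves them open. The default first function, |X - Y| / (X + Y), is purely an assumption for illustration, as is treating every state whose label ends in "+" as a positive strand state. `decide_strand` is a hypothetical name.

```python
def decide_strand(per_nucleotide, threshold=0.5,
                  first=lambda x, y: abs(x - y) / (x + y),
                  second=lambda x: x,
                  third=lambda y: y):
    """Classify strand coding from per-nucleotide state probabilities.

    per_nucleotide: list of dicts mapping a state label such as "1+" or
                    "N-" to its probability at that nucleotide.
    """
    # Steps b) and c): sum positive and negative strand state probabilities.
    x = sum(p for probs in per_nucleotide
            for s, p in probs.items() if s.endswith("+"))
    y = sum(p for probs in per_nucleotide
            for s, p in probs.items() if s.endswith("-"))
    # Step d) i): below the threshold the strands cannot be told apart.
    # The default `first` assumes x + y > 0, which holds when each
    # nucleotide's probabilities sum to one.
    if first(x, y) < threshold:
        return "mixed or not detectable"
    # Step d) ii) and iii).
    return "positive" if second(x) > third(y) else "negative"
```

For example, ten nucleotides that each carry 0.9 on state 1+ and 0.1 on state 1- give X = 9 and Y = 1, so the assumed first function evaluates to 0.8 and the call returns "positive".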



18. The method of claim 17, wherein said sum of probabilities for positive states is X, said sum
of probabilities for negative states is Y, and said first function is
f(X, Y) = [numerator illegible] / (X + Y).

19. The method of claim 18, wherein said threshold value is from about 0.4 to about 0.6.

20. The method of claim 17, wherein said sum of probabilities for positive states is X, said sum
of probabilities for negative states is Y, said second function is f(X)=X, and said third function
is f(Y)=Y.

21. The method of claim 17, wherein step a) comprises:

e) determining an initial oligonucleotide probability for each of said states for an initial
oligonucleotide in a window of a first nucleotide;

f) determining transition probabilities for each of said states for nucleotides within said
window following said initial oligonucleotide;

g) determining a probability for said window for each of said states;

h) determining a probability for each of said states for said nucleotide based upon said
probability for said window and a bias; and,

i) repeating steps e) through h) for each remaining nucleotide in said nucleic acid
sequence.

22. A method for determining the extent of an open reading frame within a nucleic acid 
sequence, comprising:

a) determining the probability of each of one or more states for each nucleotide in said
nucleic acid sequence based upon a bias, wherein each of said states is either a coding state or a
noncoding state;

b) determining the coding strand of said nucleic acid sequence; and,

c) determining the points within said nucleic acid sequence in said coding strand at which
the sum of the probabilities of said coding states for each nucleotide drops below a first threshold
value for a number of nucleotides greater than a second threshold value, wherein ends of said
open reading frame are indicated at said points.

23. The method of claim 22, wherein said first threshold value is about 0.4 to about 0.6. 

24. The method of claim 22, wherein said second threshold value is about 500 to about 700.

25. The method of claim 22, wherein step c) comprises: 

d) determining the sum of said coding states for a middle nucleotide located in said
nucleic acid sequence;

e) repeating step d) sequentially for nucleotides located on a first side of said middle
nucleotide until either

i) the sum of the probabilities of said coding states drops below said first
threshold value for a number of nucleotides greater than said second threshold value, or

ii) an end of said nucleic acid sequence is reached,

at which point an end of the open reading frame is indicated; and,

f) repeating step e) for nucleotides located on a second side of said middle nucleotide.
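
Illustrative sketch (not part of the claims): the outward scan of claim 25 can be expressed as a walk from the middle nucleotide in each direction. The sketch assumes that the indicated end is the last nucleotide seen before the qualifying low run began, which the claim does not spell out. `scan_side`, `orf_extent`, `low`, and `max_run` are hypothetical names, with `low` playing the role of the first threshold value and `max_run` the second.

```python
def scan_side(coding_sum, start, step, low, max_run):
    """Walk from `start` in direction `step` (+1 or -1); return the ORF end."""
    run = 0            # length of the current below-threshold run
    last_good = start  # last index not inside a below-threshold run
    i = start
    while 0 <= i < len(coding_sum):
        if coding_sum[i] < low:
            run += 1
            if run > max_run:
                return last_good   # condition i): a long low run was found
        else:
            run = 0
            last_good = i
        i += step
    return i - step                # condition ii): end of sequence reached

def orf_extent(coding_sum, low=0.5, max_run=600):
    """Return (left_end, right_end) indices around the middle nucleotide."""
    mid = len(coding_sum) // 2
    return (scan_side(coding_sum, mid, -1, low, max_run),
            scan_side(coding_sum, mid, +1, low, max_run))
```

Here `coding_sum[i]` is taken to be the summed probability of the coding states at nucleotide `i` on the coding strand.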

26. The method of claim 22, wherein said nucleic acid sequence is part of a longer nucleic acid
sequence. 



27. The method of claim 22, wherein step b) comprises:

d) summing probabilities of positive strand states for each of said nucleotides to produce
a sum of probabilities for positive states;

e) summing probabilities of negative strand states for each of said nucleotides to produce
a sum of probabilities for negative states; and,

f) deciding one of

i) coding is mixed or not detectable if a first function of said sum of probabilities
for positive states and said sum of probabilities for negative states is less than a threshold
value;

ii) coding is on said positive strand if a second function of said sum of
probabilities for positive states is greater than a third function of said sum of probabilities
for negative states and said first function is not less than said threshold value; and

iii) coding is on said negative strand if said second function of said sum of
probabilities for positive states is not greater than said third function of said sum of
probabilities for negative states and said first function is not less than said threshold
value.

28. The method of claim 22, wherein step a) comprises: 

d) determining an initial oligonucleotide probability for each of said states for an initial
oligonucleotide in a window of a first nucleotide;

e) determining transition probabilities for each of said states for nucleotides within said
window following said initial oligonucleotide;

f) determining a probability for said window for each of said states;

g) determining a probability for each of said states for said nucleotide based upon said
probability for said window and a bias; and,

h) repeating steps d) through g) for each remaining nucleotide in said nucleic acid 
sequence. 

29. A method for determining the location of insertions and deletions within a nucleic acid 
sequence, comprising:

a) determining the probability of each of one or more states for each nucleotide in said
nucleic acid sequence based upon a bias, wherein each of said states is either a coding state or a
noncoding state;

b) setting a length for a window;

c) determining which state has a maximum mean probability for said nucleic acid
sequence on a first side of a middle nucleotide in said window, wherein said window begins at a
first nucleotide;

d) determining which state has a maximum mean probability for said nucleic acid
sequence on a second side of said middle nucleotide in said window;

e) determining that a deletion or insertion occurred at said middle nucleotide if

i) said state with said maximum mean probability on said first side of said middle
nucleotide is different from said state with said maximum mean probability on said
second side of said middle nucleotide, and

ii) either an average of hypothetical state probabilities for said window with an
insertion at said middle nucleotide or an average of hypothetical state probabilities for
said window with a deletion at said middle nucleotide is greater than a sum of said
middle nucleotide's coding states probabilities; and,

f) repeating steps c) through e) for each remaining nucleotide in said nucleic acid
sequence after said first nucleotide, wherein said window begins at each remaining nucleotide in
turn.

30. The method of claim 29, further comprising:

g) determining that a deletion occurred if said average of hypothetical state probabilities
for said window with an insertion at said middle nucleotide is greater than an average of
hypothetical state probabilities for said window with a deletion at said middle nucleotide or that
an insertion occurred if said average of hypothetical state probabilities for said window with an
insertion at said middle nucleotide is not greater than an average of hypothetical state
probabilities for said window with a deletion at said middle nucleotide.

31. The method of claim 29, wherein said nucleic acid sequence is part of a longer nucleic acid
sequence. 

32. The method of claim 29, wherein said repeating in step f) is performed sequentially from 
said first nucleotide to a last nucleotide.

33. The method of claim 29, wherein said window is about 75 to about 125.

34. The method of claim 29, wherein step a) comprises:

g) determining an initial oligonucleotide probability for each of said states for an initial
oligonucleotide in a window of a first nucleotide;

h) determining transition probabilities for each of said states for nucleotides within said
window following said initial oligonucleotide;

i) determining a probability for said window for each of said states;

j) determining a probability for each of said states for said nucleotide based upon said
probability for said window and a bias; and,

k) repeating steps g) through j) for each remaining nucleotide in said nucleic acid
sequence.



35. A method for determining exon location within a nucleic acid sequence, comprising:

a) determining the probability of each of one or more states for each nucleotide in said
nucleic acid sequence based upon a bias, wherein each of said states is either a coding state or a
noncoding state;

b) determining the coding strand of said nucleic acid sequence;

c) determining the extent of an open reading frame within said nucleic acid sequence;

d) classifying each nucleotide in a coding class or a noncoding class based on a most
probable state for said coding strand;

e) reclassifying each nucleotide according to defined rules; and,

f) determining that regions of said nucleic acid sequence in said coding class are exons.

36. The method of claim 35, wherein step e) comprises:

g) reclassifying a noncoding nucleotide to a class of an adjacent nucleotide on a first side
of said noncoding nucleotide and an adjacent nucleotide on a second side of said noncoding
nucleotide if said adjacent nucleotide on said first side and said adjacent nucleotide on said
second side all are of a single class;

h) reclassifying a nucleotide to a class of two adjacent nucleotides on a first side and two
adjacent nucleotides on a second side if said two adjacent nucleotides on said first side and said
two adjacent nucleotides on said second side all are of a single class;

i) reclassifying a first pair of adjacent nucleotides having a same class to a class of two
adjacent nucleotides on a first side of said first pair and two adjacent nucleotides on a second
side of said first pair if said two adjacent nucleotides on said first side and said two adjacent
nucleotides on said second side all are of a single class;

j) reclassifying a second pair of adjacent nucleotides having a same class to a class of an
adjacent nucleotide on a first side of said second pair and an adjacent nucleotide on a second side
of said second pair if said adjacent nucleotide on said first side and said adjacent nucleotide on
said second side both are of a single class;

k) reclassifying a nucleotide to a class of an adjacent nucleotide on a first side of said
single nucleotide and an adjacent nucleotide on a second side of said nucleotide if said adjacent
nucleotide on said first side and said adjacent nucleotide on said second side both are of a single
class;

l) reclassifying a contiguous sequence of less than a defined minimum number of
nucleotides in a noncoding class having nucleotides in a coding class on both sides to a coding
class of flanking nucleotides; and,

m) reclassifying a coding segment comprising more than one class of nucleotides to a
most common class in said segment.
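
Illustrative sketch (not part of the claims): two of the reclassification rules of claim 36 can be pictured as smoothing passes over a list of per-nucleotide class labels. The labels "1", "2", "3" for coding frames and "N" for noncoding are hypothetical, as are the function names. The second function assumes both flanks of a short noncoding run share one coding class and simply takes the class of the left flank.

```python
from itertools import groupby

def smooth_singletons(classes):
    """Rule k): a lone nucleotide takes the class shared by both neighbours."""
    out = list(classes)
    for i in range(1, len(classes) - 1):
        left, right = classes[i - 1], classes[i + 1]
        # Neighbours are read from the input so all updates are simultaneous.
        if left == right and classes[i] != left:
            out[i] = left
    return out

def fill_short_noncoding(classes, min_len, noncoding="N"):
    """Rule l): noncoding runs shorter than `min_len` between coding runs
    are relabelled with the class of the flanking coding nucleotides."""
    runs = [(c, len(list(g))) for c, g in groupby(classes)]
    out = []
    for k, (c, n) in enumerate(runs):
        interior = 0 < k < len(runs) - 1   # has a run on both sides
        if c == noncoding and n < min_len and interior:
            c = runs[k - 1][0]             # assumed: left flank's class
        out.extend([c] * n)
    return out
```

Applying the rules in a fixed order matters, since each pass changes the runs seen by the next; the order used in any real implementation is not stated here.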



37. The method of claim 35, wherein step b) comprises:

g) summing probabilities of positive strand states for each of said nucleotides to produce
a sum of probabilities for positive states;

h) summing probabilities of negative strand states for each of said nucleotides to produce
a sum of probabilities for negative states; and,

i) deciding one of

i) coding is mixed or not detectable if a first function of said sum of probabilities
for positive states and said sum of probabilities for negative states is less than a threshold
value;

ii) coding is on said positive strand if a second function of said sum of
probabilities for positive states is greater than a third function of said sum of probabilities
for negative states and said first function is not less than said threshold value; and

iii) coding is on said negative strand if said second function of said sum of
probabilities for positive states is not greater than said third function of said sum of
probabilities for negative states and said first function is not less than said threshold
value.



38. The method of claim 35, wherein step c) comprises: 

g) determining the points within said nucleic acid sequence in said coding strand at which 
a sum of the probabilities of coding states for each nucleotide drops below a first threshold value
for a number of nucleotides greater than a second threshold value, wherein ends of an open
reading frame are indicated at said points.

39. The method of claim 35, wherein step a) comprises:

g) determining an initial oligonucleotide probability for each of said states for an initial
oligonucleotide in a window of a first nucleotide;

h) determining transition probabilities for each of said states for nucleotides within said
window following said initial oligonucleotide;

i) determining a probability for said window for each of said states;

j) determining a probability for each of said states for said nucleotide based upon said
probability for said window and a bias; and,

k) repeating steps g) through j) for each remaining nucleotide in said nucleic acid
sequence.

40. The method of claim 35, further comprising:

g) translating said exons to determine a protein sequence. 
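
Illustrative sketch (not part of the claims): the translation step of claim 40 can be pictured as joining the exons and reading codons. The sketch assumes the standard genetic code, exons supplied on the coding strand in 5' to 3' order, and a joined sequence that starts in frame. `translate_exons` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate_exons(exons):
    """Concatenate exons and translate until a stop codon or the end."""
    cds = "".join(exons).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for ambiguous codons
        if amino == "*":                            # stop codon ends translation
            break
        protein.append(amino)
    return "".join(protein)
```

Note that a codon may straddle an exon boundary, which is why the exons are joined before any codon is read.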

41. A method for determining a probability for one or more states for a nucleotide in a nucleic
acid sequence, comprising determining a probability for each of said states for said nucleotide 
based upon a probability of said nucleic acid sequence and a bias. 

42. A method for determining a probability for each of one or more states for more than one 
nucleotide in a nucleic acid sequence comprising: 

a) determining a probability for each of said states for a first nucleotide in said nucleic 
acid sequence based upon a probability of a window in which said first nucleotide is located and 
a bias; and, 

b) repeating step a) for the remaining nucleotides in said nucleic acid sequence. 

43. A program storage device readable by a machine, tangibly embodying a program of 
instructions executable by a machine to perform method steps to determine a probability for each 
of one or more states for a nucleotide in a nucleic acid sequence, said method steps comprising: 

a) determining an initial oligonucleotide probability for each of said states for an initial
oligonucleotide in said nucleic acid sequence; 

b) determining transition probabilities for each of said states for nucleotides within said 
nucleic acid sequence following said initial oligonucleotide; 

c) determining a probability for said nucleic acid sequence for each of said states; and, 

d) determining a probability for each of said states for said nucleotide based upon said 
probability of said nucleic acid sequence and a bias. 

44. A program storage device readable by a machine, tangibly embodying a program of 
instructions executable by a machine to perform method steps to determine a probability for one 
or more states for more than one nucleotide in a nucleic acid sequence, said method steps 
comprising: 

a) determining an initial oligonucleotide probability for each of said states for an initial 
oligonucleotide in a window of a first nucleotide; 

b) determining transition probabilities for each of said states for nucleotides within said 
window following said initial oligonucleotide; 

c) determining a probability for said window for each of said states; 

d) determining a probability for each of said states for said nucleotide based upon said 
probability for said window and a bias; and, 

e) repeating steps a) through d) for each remaining nucleotide in said nucleic acid 
sequence. 